Bioinformatics of Brain Diseases


microarrays is the high cost of each experiment and the growing number of probe designs that rely on low-specificity sequences [13]. These disadvantages prompted researchers to develop a sequence-based technique: RNA-seq. Nevertheless, microarrays remain a sensible choice when there are many samples and cost is a concern, or when expression profiles must be compared directly with data from another microarray platform.

8.2.2 RNA-seq Technologies

RNA sequencing (RNA-seq) is a technique used to detect and quantify mRNA molecules in a biological sample consisting of millions of cells [14]. It uses high-throughput sequencing not only to quantify gene expression but also to identify alternatively spliced genes, detect allele-specific expression, and more. RNA-seq can be applied to various types of RNA, such as mRNA, total RNA, microRNA, single-cell RNA, and long noncoding RNA [15]. In an RNA-seq experiment, the RNA is first isolated and converted to cDNA. The cDNA is then fragmented into short pieces and a sequencing library is prepared, a step that includes PCR amplification; finally, the library is sequenced on an NGS (next-generation sequencing) platform (Figure 8.1B). The resulting sequence reads, stored in FASTQ format, are then aligned to a reference sequence [16].
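A FASTQ file stores each read as four lines: a header beginning with `@`, the base calls, a separator line beginning with `+`, and one ASCII-encoded quality character per base. The following Python sketch reads such records and decodes the quality characters into Phred scores. It assumes the Phred+33 encoding used by current Illumina pipelines and unwrapped four-line records; `FastqRecord` and `parse_fastq` are illustrative names, not part of any library. For real data, an established parser such as Biopython's `Bio.SeqIO` is a safer choice.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    """One FASTQ entry (hypothetical helper type for this sketch)."""
    read_id: str
    sequence: str
    qualities: List[int]  # Phred scores decoded from the ASCII quality line


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record.

    Assumes Phred+33 quality encoding and no line wrapping inside records.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read identifier is the header text up to the first space.
        read_id = header[1:].partition(" ")[0]
        # Each quality character encodes a Phred score as (ASCII code - offset).
        yield FastqRecord(read_id, sequence, [ord(c) - offset for c in quality])


if __name__ == "__main__":
    example = "@read1 sample=A\nACGTTGCA\n+\nIIIIII##\n"
    for record in parse_fastq(io.StringIO(example)):
        print(record.read_id, record.sequence, record.qualities)
        # prints: read1 ACGTTGCA [40, 40, 40, 40, 40, 40, 2, 2]
```

Decoding qualities up front keeps later QC code simple; a generator is used so that large FASTQ files can be processed one record at a time without loading them into memory.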

There are several NGS platforms, with Illumina (www.illumina.com) being the most popular. Other major platforms include Roche 454 (www.454.com), Pacific Biosciences (www.pacificbiosciences.com), Ion Torrent (www.iontorrent.com), and SOLiD (www.invitrogen.com). These platforms differ in their sequencing and detection chemistry, and each has its own protocol. The choice of platform may depend on the accuracy required, the number and length of the reads, whether RNA or DNA is sequenced, the amount of sample material, the cost, and the time available to complete the work [17]. RNA-seq is an intricate, multistep process involving PCR amplification, fragmentation, purification, and sequencing, and an error at any of these stages can make the data unreliable. This is why quality control (QC) is an important aspect of RNA-seq.

QC of RNA is a critical step prior to library preparation. To obtain high-quality RNA, the sample must be stabilized after collection, fully lysed, and cleared of any potential DNA contamination. Furthermore, poor-quality RNA-seq data can dramatically bias the results of an analysis and lead to false conclusions. Biases related to GC content (guanine-cytosine content), nucleotide composition, and transcriptome complexity can also produce flawed data [18]. Rigorous QC must therefore be applied to the raw data before any downstream analysis [19].
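To make raw-data QC more concrete, the following Python sketch computes two per-read metrics: mean base quality and GC content. A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q20 means roughly a 1% chance that a call is wrong. The sketch assumes Phred+33 quality strings and uses an illustrative mean-quality cutoff of Q20; GC content is only reported, not filtered on. The function names, input shape, and threshold are hypothetical; in practice, dedicated tools such as FastQC (for QC reports) and read-trimming software are normally used.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (read_id, sequence, Phred+33 quality string); a hypothetical input shape
Read = Tuple[str, str, str]


def gc_fraction(seq: str) -> float:
    """G+C fraction among A/C/G/T calls; N and other ambiguity codes are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / called if called else 0.0


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of an ASCII-encoded quality string."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def phred_to_error(q: float) -> float:
    """Estimated base-call error probability for Phred score q: 10^(-q/10)."""
    return 10 ** (-q / 10)


def qc_summary(reads: Iterable[Read], min_mean_q: float = 20.0) -> Dict[str, object]:
    """Drop reads with mean quality below min_mean_q; report GC of the survivors.

    GC content is reported, not filtered on: a shifted or bimodal GC
    distribution across a library is a warning sign to investigate.
    """
    total = 0
    kept: List[str] = []
    gc_values: List[float] = []
    for read_id, seq, quality in reads:
        total += 1
        if mean_phred(quality) < min_mean_q:
            continue  # low-quality read: excluded from downstream analysis
        kept.append(read_id)
        gc_values.append(gc_fraction(seq))
    mean_gc: Optional[float] = sum(gc_values) / len(gc_values) if gc_values else None
    return {"total": total, "kept": kept, "mean_gc": mean_gc}


if __name__ == "__main__":
    reads = [
        ("r1", "GGCCAATT", "IIIIIIII"),  # mean Q40, GC 0.5
        ("r2", "ACGTACGT", "########"),  # mean Q2, dropped
    ]
    print(qc_summary(reads))  # {'total': 2, 'kept': ['r1'], 'mean_gc': 0.5}
```

Filtering on mean quality is a deliberate simplification; many pipelines instead trim low-quality bases from read ends with a sliding window, which keeps the usable part of a read rather than discarding it entirely.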

Unlike hybridization-based methods, RNA-seq uses sequence-based approaches to determine transcripts directly. Alternative splicing can be detected when the reads are aligned to the genome. Furthermore, SNPs and paralogous genes can be identified with this technology. The background noise is relatively